DETAILED ACTION 

Remarks 

1. In response to the Amendment filed on 5 March 2007, claims 1-34 are pending. 

2. The previous claim rejections under 35 U.S.C. 112, second paragraph, have been withdrawn. 

3. The rejections under 35 U.S.C. 101 of claims 17, 22, and 27 are maintained 
because the amendment to the preamble of the claims does not make the claimed subject 
matter statutory. 

Claim Objections 

4. Claim 22 is objected to because of the following informalities: All claims depending 
from claim 22 should state "the string segmentation schema" instead of only "The 
segmentation schema". Appropriate correction is required. 

Claim Rejections - 35 USC § 101 

5. 35 U.S.C. 101 reads as follows: 

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of 
matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the 
conditions and requirements of this title. 

6. Claims 1, 16, 17, 22, and 27 are rejected under 35 U.S.C. 101 because the 
claimed invention is directed to non-statutory subject matter. 

Claims 1, 16, 17, 22, and 27 are directed to software modules/program code. 
Program code is also known as functional descriptive material (see In re Warmerdam, 
33 F.3d at 1360, 31 USPQ2d at 1759). The content is not structurally and functionally 
interrelated to a computer-readable medium, thereby rendering it incapable of producing 
a useful, concrete and tangible result, and is therefore non-statutory. The claims should 
be amended to recite hardware in the body of the claims. 

Claim Rejections - 35 USC § 112 

7. The following is a quotation of the first paragraph of 35 U.S.C. 112: 

The specification shall contain a written description of the invention, and of the manner and process of 
making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the 
art to which it pertains, or with which it is most nearly connected, to make and use the same and shall 
set forth the best mode contemplated by the inventor of carrying out his invention. 

8. Claims 17, 22, and 27 are rejected under 35 U.S.C. 112, first paragraph, as 
failing to comply with the enablement requirement. The claim(s) contains subject matter 
which was not described in the specification in such a way as to enable one skilled in 
the art to which it pertains, or with which it is most nearly connected, to make and/or use 
the invention. 

Claims 17, 22, and 27 recite subject matter that contradicts 
applicant's specification. The claims recite, "wherein the existing collection of data 
records does not comprise manually segmented training data". However, applicant's 
specification, page 3, lines 2-6, states, "the present embodiment... does not require explicitly 
labeled [i.e. segmented] data". The examiner is unable to ascertain what type of 
training data the system uses, for example, automatically segmented data, 
unsegmented data, or unlabeled data. One of ordinary skill in the art would not know 
how to make or use the invention absent further description. 

Claim Rejections - 35 USC § 103 

9. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

10. Claims 1-34 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Borkar et al., "Automatic segmentation of text strings into structured records", in 
view of Ando et al., "Mostly-Unsupervised Statistical Segmentation of Japanese 
Sequences". 

As to claims 1 and 27, Borkar et al. disclose: 

A process (see Abstract, pg. 1, line 1) and system (see Abstract, pg. 1, paragraph 2, 
line 1; wherein DATAMOLD is a system of interrelated components used to segment 
text) to evaluate an input string to segment said input string into component parts 
comprising: 

means for providing a state transition model (see Abstract, pg. 1, paragraph 2, line 1; 
DATAMOLD) based on an existing collection of data records that includes 
probabilities to segment input strings into component parts which adjusts said 
probabilities to account for token placement in the input string (see pg. 7, section 
2.5.1, lines 16-21); 

means for determining a most probable segmentation (see Abstract, pg. 1, paragraph 2, 
line 1; DATAMOLD) of the input string by comparing an order of tokens that make 
up the input string with a state transition model derived from the collection of data 
records (see pg. 3, section 1.3.1, col. 2, lines 9-11; wherein the inner HMMs 
corroborate each other's findings to pick the segmentation that is globally 
optimal); 

means for segmenting the input string into one or more component parts according to 
the most probable segmentation (see page 4, col. 2, lines 6-9 and 37-38); and 

means for storing the one or more component parts in a database (see Abstract, line 7). 

However, Borkar et al. do not explicitly disclose: 

wherein the existing collection of data records does not comprise manually segmented 
training data. 

Ando et al. disclose: 

wherein the existing collection of data records does not comprise manually segmented 
training data (see abstract, lines 5-9 and page 2, lines 26-30). 

It would have been obvious to have modified the teachings of Borkar et al. by the 
teachings of Ando et al. to provide a simple, efficient segmentation method thus 
avoiding the costs of hand-segmenting (manually segmenting) training data (see Ando 
et al., page 2, lines 26-30). 

As to claims 2 and 28, Borkar et al., as modified, disclose: 

wherein the state transition model has probabilities for multiple states of said model and 
a most probable segmentation is determined based on a most probable token emission 
path through different states of the state transition model from a beginning state to an 
end state (see Borkar et al., pg. 4, col. 1, line 3; wherein the HMM has multiple states 
and col. 2, lines 6-9; the path having the highest probability). 
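
For illustration only, and not as part of the rejection, the "most probable token emission path" limitation can be sketched as a Viterbi search over a small hidden Markov model. Every state name, token class, and probability below is a hypothetical assumption; none is taken from the claims, the specification, or Borkar et al. The sketch merely shows how the highest-probability path from a beginning state to an end state yields a segmentation.

```python
import math


def viterbi(symbols, states, start_p, trans_p, emit_p, end_p):
    """Return (log probability, state path) of the most probable path.

    All probability tables are hypothetical dictionaries supplied by the caller.
    """
    def log(p):
        # Guard against log(0) for impossible transitions or emissions.
        return math.log(p) if p > 0 else float("-inf")

    # best[s] = (log probability of the best path ending in s, that path)
    best = {
        s: (log(start_p.get(s, 0)) + log(emit_p[s].get(symbols[0], 0)), [s])
        for s in states
    }
    for sym in symbols[1:]:
        nxt = {}
        for s in states:
            # Pick the predecessor state that maximizes the path score.
            score, path = max(
                ((best[r][0] + log(trans_p[r].get(s, 0)), best[r][1])
                 for r in states),
                key=lambda item: item[0],
            )
            nxt[s] = (score + log(emit_p[s].get(sym, 0)), path + [s])
        best = nxt
    # Close each candidate path with its transition into the end state.
    return max(
        ((best[s][0] + log(end_p.get(s, 0)), best[s][1]) for s in states),
        key=lambda item: item[0],
    )


# Hypothetical address model: tokens are reduced to coarse classes.
states = ["house_no", "street", "city"]
start_p = {"house_no": 0.9, "street": 0.1}
trans_p = {
    "house_no": {"street": 1.0},
    "street": {"street": 0.5, "city": 0.5},
    "city": {"city": 0.3},
}
end_p = {"city": 0.7}
emit_p = {
    "house_no": {"NUM": 0.9, "WORD": 0.1},
    "street": {"NUM": 0.1, "WORD": 0.9},
    "city": {"WORD": 1.0},
}

# "52 Main Street Springfield" reduced to token classes.
symbols = ["NUM", "WORD", "WORD", "WORD"]
log_prob, path = viterbi(symbols, states, start_p, trans_p, emit_p, end_p)
print(path)  # ['house_no', 'street', 'street', 'city']
```

Under these assumed probabilities, the first token is assigned to the house number, the next two to the street, and the last to the city, which is the segmentation implied by the highest-probability path.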

As to claims 3 and 29, Borkar et al., as modified, disclose: 

means for maintaining a collection of records, wherein the collection of data records is 
stored in a database relation and an order of attributes for the database relation as the 
most probable segmentation is determined (see Borkar et al., pg. 3, Fig. 1; wherein the 
structured record is determined and produced). 

As to claims 4 and 30, Borkar et al., as modified, disclose: 

wherein the input string is segmented into sub-components which correspond to 
attributes of the database relation (see Borkar et al., pg. 1, col. 2, section 1.1, lines 5-18). 

As to claims 5 and 31, Borkar et al., as modified, disclose: 

wherein the tokens are substrings of said input string (see Borkar et al., pg. 6, section 
2.4, lines 2-4). 

As to claims 6 and 32, Borkar et al., as modified, disclose: 

wherein the input string is to be segmented into database attributes and wherein each 
attribute has a state transition model based on the contents of the database relation 
(see Borkar et al., pg. 4, Fig. 2; wherein each attribute has a transition in the model). 

As to claims 7 and 33, Borkar et al., as modified, disclose: 

wherein the state transition model has multiple states for a beginning, middle and 
trailing position within an input string (see Borkar et al., pg. 6, Fig. 6; wherein state "1" is 
the beginning, state "2" is the middle and state "3" is the trailing position). 

As to claims 8 and 34, Borkar et al., as modified, disclose: 

wherein the state transition model has probabilities for the states and a most probable 
segmentation is determined based on a most probable token emission [state] path 
through different states of the state transition model from a beginning state to an end 
state (see Borkar et al., pg. 6, Fig. 6 and col. 2, paragraph 2, lines 1-4). 

As to claim 9, Borkar et al., as modified, disclose: 

wherein input attribute order for records to be segmented is known in advance of 
segmentation of an input string (see Borkar et al., Abstract, pg. 1, paragraph 2, lines 3-8). 

As to claim 10, Borkar et al., as modified, disclose: 

wherein an attribute order is learned from a batch of records that are inserted into the 
table (see Borkar et al., Abstract, pg. 1, paragraph 2, lines 1-3). 

As to claim 11, Borkar et al., as modified, disclose: 

wherein the state transition model has at least some states corresponding to base 
tokens occurring in the reference relation (see Borkar et al., Abstract, pg. 1, paragraph 
2, lines 1-8; wherein the training examples and dictionary provide the basis for 
acceptable and recognizable input and therefore some states would correspond to the 
same structure/examples or base tokens). 

As to claim 12, Borkar et al., as modified, disclose: 

wherein the state transition model has class states corresponding to token patterns 
within said reference relation (see Borkar et al., pg. 3, col. 1, paragraph 3, lines 1-8). 

As to claim 13, Borkar et al., as modified, disclose: 

wherein the state transition model includes states that account for missing, misordered 
and inserted tokens within an attribute (see Borkar et al., pgs. 3-4, section 2; wherein 
DATAMOLD uses the example segmented records to output a model that, when presented 
with any unseen text, segments it into one or more of its constituent elements). 

As to claim 15, Borkar et al., as modified, disclose: 

A machine computer readable medium containing instructions to perform the 
[evaluation] [of] an input string to segment said input string into component parts (see 
Borkar et al., pg. 1, section 1.1, lines 5-6; wherein the tool is used during warehouse 
construction which implies that the program instructions are being read from a medium 
inserted in or stored on a machine). 

As to claim 17, Borkar et al. disclose: 

a) a database management system to store records organized into relations 
wherein data records within a relation are organized into a number of 
attributes (see page 1, Abstract, line 7; corporate database); 

b) a model building component that builds a number of attribute recognition 
models based on an existing relation of data records, wherein one or more 
of said attribute recognition models includes probabilities for segmenting 
input strings into component parts which adjusts said probabilities to 
account for erroneous entries within an input string (see page 1, Abstract, 
lines 13-14; wherein DATAMOLD comprises a model building component 
because it is built on HMMs; and see pg. 7, section 2.5.1, lines 16-21; 
accounting for invalid paths); and 

c) a segmenting component that receives an input string and determines a most 
probable record segmentation by evaluating transition probabilities of 
states within the attribute recognition models built by the model building 
component (see page 2, section 1.3, lines 1-3; wherein DATAMOLD 
comprises a segmenting component). 

However, Borkar et al. do not explicitly disclose: 

wherein the existing collection of data records does not comprise manually segmented 
training data. 

Ando et al. disclose: 

wherein the existing collection of data records does not comprise manually segmented 
training data (see abstract, lines 5-9 and page 2, lines 26-30). 

It would have been obvious to have modified the teachings of Borkar et al. by the 
teachings of Ando et al. to provide a simple, efficient segmentation method thus 
avoiding the costs of hand-segmenting (manually segmenting) training data (see Ando 
et al., page 2, lines 26-30). 

As to claim 18, Borkar et al., as modified, disclose: 

wherein the segmenting component receives a batch of evaluation strings and 
determines an attribute order of strings in said batch and thereafter assumes the input 
string has tokens in the same attribute order as the evaluation strings (see Borkar et al., 
Abstract, pg. 1, paragraph 2, lines 3-8; wherein the training examples are the batch of 
strings that provide a basis for the structure of strings). 

As to claim 19, Borkar et al., as modified, disclose: 

wherein the segmenting component evaluates the tokens in an order in which they are 
contained in the input string and considers state transitions from multiple attribute 
recognition models to find a maximum probability for the state of a token to provide a 
maximum probability for each token in said input string (see Borkar et al., pg. 4, section 
2.1; wherein the segmenting component considers transitions from the multiple attribute 
states to find the maximum probability). 

As to claim 21, Borkar et al., as modified, disclose: 

wherein the model building component defines a start and end state for each model and 
accommodates missing attributes by assigning a probability for a transition from the 
start to the end state (see Borkar et al., pg. 6, Fig. 6). 
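
For illustration only, and not as part of the rejection, the limitation of accommodating a missing attribute through a start-to-end transition can be sketched as a transition table in which the start state carries a direct edge to the end state. The state names, the `build_attribute_model` and `path_probability` helpers, and all probabilities are hypothetical assumptions and are not taken from the claims or from Borkar et al.

```python
import math


def build_attribute_model(p_missing):
    """Transition table for one hypothetical attribute recognition model.

    The START -> END edge carries the probability that the attribute is
    absent from the input string altogether.
    """
    return {
        "START": {"BEGINNING": 1.0 - p_missing, "END": p_missing},
        "BEGINNING": {"MIDDLE": 0.5, "TRAILING": 0.3, "END": 0.2},
        "MIDDLE": {"MIDDLE": 0.4, "TRAILING": 0.6},
        "TRAILING": {"END": 1.0},
    }


def path_probability(model, path):
    """Multiply the transition probabilities along a path of state names."""
    prob = 1.0
    for src, dst in zip(path, path[1:]):
        prob *= model[src].get(dst, 0.0)  # unknown edges have probability 0
    return prob


model = build_attribute_model(p_missing=0.1)

# An empty attribute consumes no tokens: START goes straight to END.
p_empty = path_probability(model, ["START", "END"])

# A two-token attribute passes through the beginning and trailing states.
p_two_tokens = path_probability(model, ["START", "BEGINNING", "TRAILING", "END"])
```

Because the outgoing probabilities of each state sum to one, assigning mass to the start-to-end edge lowers the probability of every path that does emit tokens, which is how such a model trades off a missing attribute against a present one.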

As to claim 22, Borkar et al. disclose: 

a state transition model for a data attribute of a data record wherein the transition model 
assigns token probabilities to a beginning, middle and trailing state of the model that are 
transitioned to from a start state and terminate with an end state (see Page 6, Fig. 6; 
wherein the state transition model has states for attributes of the input record and the 
edges represent the probabilities to the first (beginning state), second (middle state), 
third (trailing state)). 

However, Borkar et al. do not explicitly disclose: 

wherein the existing collection of data records does not comprise manually segmented 
training data. 

Ando et al. disclose: 

wherein the existing collection of data records does not comprise manually segmented 
training data (see abstract, lines 5-9 and page 2, lines 26-30). 

It would have been obvious to have modified the teachings of Borkar et al. by the 
teachings of Ando et al. to provide a simple, efficient segmentation method thus 
avoiding the costs of hand-segmenting (manually segmenting) training data (see Ando 
et al., page 2, lines 26-30). 

As to claim 24, Borkar et al., as modified, disclose: 

wherein the schema includes state transition models for multiple attributes of a 
database record and one or more of said models provide a transition probability 
between the start state and the end state of each attribute recognition model to 
accommodate missing attributes within an input string (see Borkar et al., pg. 4, figure 2; 
wherein the model includes states for each attribute in an input string from a database 
record and the edges provide the probabilities between start and end states). 

As to claim 25, Borkar et al. disclose: 

A process of segmenting a string input record into a sequence of attributes for inclusion 
into a database table comprising: 

considering a first token in a string input record and determining a maximum state 
probability for said token based on state transition models for multiple data table 
attributes (see pg. 4, section 2.1; wherein the segmenting component considers 
transitions from the multiple attribute states to find the maximum probability); 

considering in turn subsequent tokens in the string input record and determining 
maximum state probabilities for said subsequent tokens from a previous token 
state until all tokens are considered (see pg. 4, section 2.1; wherein the 
segmenting component considers transitions from the multiple attribute states to 
find the maximum probability); and 

segmenting the string record by assigning the tokens of the string to attribute states of 
the state transition models corresponding to said maximum state probabilities 
(see pg. 4, Fig. 2, wherein the model displays attributes represented by states 
and section 2.1; wherein the segmenting component considers transitions from 
the multiple attribute states to find the maximum probability). 

However, Borkar et al. do not explicitly disclose: 

wherein the state transition models are based on an existing collection of data records 
that does not comprise manually segmented training data. 

Ando et al. disclose: 

wherein the state transition models are based on an existing collection of data records 
(sequences) that does not comprise manually segmented training data (see abstract, 
lines 5-9 and page 2, lines 26-30). 

It would have been obvious to have modified the teachings of Borkar et al. by the 
teachings of Ando et al. to provide a simple, efficient segmentation method thus 
avoiding the costs of hand-segmenting (manually segmenting) training data (see Ando 
et al., page 2, lines 26-30). 

As to claim 26, Borkar et al., as modified, disclose: 

additionally comprising determining an attribute order for a batch of string input records 
and using the order to limit the possible state probabilities when evaluating tokens in an 
input string (see Borkar et al., Abstract, pg. 1, paragraph 2, lines 1-3; wherein the 
structure [and order] is learned from the training examples). 

11. Claims 14, 20, and 23 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Borkar et al., "Automatic segmentation of text strings into structured 
records", in view of Ando et al., "Mostly-Unsupervised Statistical Segmentation of 
Japanese Sequences", and further in view of Reed (U.S. Pat. No. 5,095,432). 

As to claim 14, Borkar et al. and Ando et al. do not explicitly disclose: 

wherein the state transition model has a beginning, a middle and a trailing state 
topology and the process of accounting for misordered and inserted tokens is performed 
by copying states from one of said beginning, middle or trailing states into another of 
said beginning, middle or trailing states. 

However, Reed discloses: 

wherein the state transition model has a beginning, a middle and a trailing state 
topology and the process of accounting for misordered and inserted tokens is performed 
by copying states from one of said beginning, middle or trailing states into another of 
said beginning, middle or trailing states (see col. 5, lines 1). 

It would have been obvious, at the time of the invention, having the teachings of 
Borkar et al., Ando et al., and Reed before him/her, to combine the steps as disclosed 
by Borkar et al. and Ando et al. with the feature as disclosed by Reed to enable 
grammar developers to use the familiar PSG formalism to compile their grammars into 
RVG for more efficient execution (see Reed, col. 2, lines 54-57). 

As to claims 20 and 23, Borkar et al., as modified, disclose: 

wherein the model building component assigns states for each attribute for a beginning, 
middle and trailing token position (see Borkar et al., pg. 4, Fig. 2; wherein the states are 
assigned to each attribute and pg. 6, Fig. 6; wherein states are assigned for first 
(beginning state), second (middle state), third (trailing state)). 

However, Borkar et al. do not explicitly disclose: 

wherein the model building component relaxes token acceptance by the model by 
copying states among said beginning, middle and trailing token positions. 

Reed discloses: 

wherein the model building component relaxes token acceptance by the model by 
copying states among said beginning, middle and trailing token positions (see col. 5, 
lines 1; wherein states in the transition model are copied). 

It would have been obvious, at the time of the invention, having the teachings of 
Borkar et al., Ando et al., and Reed before him/her, to combine the steps as disclosed 
by Borkar et al. and Ando et al. with the feature as disclosed by Reed to enable 
grammar developers to use the familiar PSG formalism to compile their grammars into 
RVG for more efficient execution (see Reed, col. 2, lines 54-57). 


Application/Control Number: 10/825,488 
Art Unit: 2166 

12. Claim 16 is rejected under 35 U.S.C. 103(a) as being unpatentable over Borkar 
et al., "Automatic segmentation of text strings into structured records", in view of Ando 
et al., "Mostly-Unsupervised Statistical Segmentation of Japanese Sequences", and 
further in view of Fairweather (U.S. PG. Pub. No. 2006/0235811). 

As to claim 16, Borkar et al. disclose: 

providing a reference table of string records that are segmented into multiple substrings 
corresponding to database attributes (see Abstract, p. 1, paragraph 2, lines 1-3); 

breaking the input record into a sequence of tokens, and determining a most probable 
segmentation of the input record by comparing the tokens of the input record with 
state models derived for attributes from the reference table (see pg. 3, section 
1.3.1, col. 2, lines 9-11; wherein the inner HMMs corroborate each other's 
findings to pick the segmentation that is globally optimal). 

However, Borkar et al. do not explicitly disclose: 

wherein the reference table of string records does not comprise manually segmented 
training data. 

analyzing the substrings within an attribute to provide a state model that assumes a 
beginning, a middle and a trailing token topology for said attribute said topology 
including a null token for an empty attribute component; 

Ando et al. disclose: 

wherein the reference table of string records (sequences) does not comprise manually 
segmented training data (see abstract, lines 5-9 and page 2, lines 26-30). 

It would have been obvious to have modified the teachings of Borkar et al. by the 
teachings of Ando et al. to provide a simple, efficient segmentation method thus 
avoiding the costs of hand-segmenting (manually segmenting) training data (see Ando 
et al., page 2, lines 26-30). 

However, Borkar et al. and Ando et al. do not explicitly disclose: 

analyzing the substrings within an attribute to provide a state model that assumes a 
beginning, a middle and a trailing token topology for said attribute said topology 
including a null token for an empty attribute component. 

Fairweather discloses: 

analyzing the substrings within an attribute to provide a state model that assumes a 
beginning, a middle and a trailing token topology for said attribute said topology 
including a null token for an empty attribute component (see Fairweather, 
paragraph [0406], lines 8-9; wherein a null pointer is returned because the 
token is null). 

It would have been obvious, at the time of the invention, having the teachings of 
Borkar et al., Ando et al., and Fairweather before him/her, to combine the steps as 
disclosed by Borkar et al. and Ando et al. with the feature as disclosed by Fairweather 
to provide a system in which the content of the data itself actually determines the order 
of execution of statements in the mining language and automatically keeps track of the 
current state (see Fairweather, paragraph [0004], lines 7-10). 

Response to Arguments 

13. Applicant's arguments with respect to claims 1, 14, 16, 17, 20, 22, 23, 25, and 27 
have been considered but are moot in view of the new ground(s) of rejection. 

With respect to claims 1, 14, 16, 17, 20, 22, 23, 25, and 27, applicant argues that 
none of the cited references disclose or suggest the element "wherein the existing 
collection of data records does not comprise manually segmented training data". 
Applicant's arguments are contradictory to the actual elements of the specification. 
Applicant argues that the embodiments of the present application are directed to 
"...unsupervised text segmentation utilizing a reference table or relation that does not 
require explicitly labeled (i.e., segmented) training data ..." (see specification page 3, 
lines 2-6). The act of not requiring manually segmented training data is not the same as 
not requiring any segmented data. The examiner suggests the applicant claim what his 
specification teaches. 

The examiner notes applicant's remarks regarding claim 29. The claimed 
limitations recite subject matter similar to that of claim 3 and are rejected under the same 
rationale. 

Conclusion 

14. Applicant's amendment necessitated the new ground(s) of rejection presented in 
this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP 
§ 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 
CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the date of this final action. 

15. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Johnese Johnson whose telephone number is 
571-270-1097. The examiner can normally be reached on 4/5/9. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Hosain Alam, can be reached on 571-272-3978. The fax phone number for 
the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 





JJ 



